!pip install lightgbm
!pip install catboost
!pip install inflection
!pip install dython
!pip install shap
Requirement already satisfied (in c:\users\ployh\anaconda3\lib\site-packages, along with their dependencies): lightgbm (3.3.2), catboost (1.0.5), inflection (0.5.1), dython (0.7.1.post3), shap (0.40.0)
pip install chart_studio
Successfully installed chart-studio-1.1.0 retrying-1.3.3
Note: you may need to restart the kernel to use updated packages.
pip install cufflinks
Requirement already satisfied: cufflinks in c:\users\ployh\anaconda3\lib\site-packages (0.17.3), along with its dependencies
Note: you may need to restart the kernel to use updated packages.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import norm, skew
from scipy import stats
import statsmodels.api as sm
import matplotlib.ticker as mtic
import seaborn as sns
# import essential machine learning libraries
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn import svm, tree, linear_model, neighbors
from sklearn import naive_bayes, ensemble, discriminant_analysis, gaussian_process
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.linear_model import RidgeClassifierCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.metrics import f1_score, precision_score, recall_score, fbeta_score
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import KFold
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn import feature_selection
from sklearn import model_selection
from sklearn import metrics
from sklearn.metrics import classification_report, precision_recall_curve
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.metrics import make_scorer, recall_score, log_loss
from sklearn.metrics import average_precision_score
from sklearn.metrics import plot_confusion_matrix
import seaborn as sns
from matplotlib import pyplot
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import matplotlib
%matplotlib inline
from pylab import rcParams
color = sns.color_palette()
import matplotlib.ticker as mtick
from IPython.display import display
pd.options.display.max_columns = None
from pandas.plotting import scatter_matrix
from sklearn.metrics import roc_curve
from lightgbm import LGBMRegressor, LGBMClassifier, Booster
init_func = LGBMRegressor
import plotly
import plotly.express as px
import plotly.graph_objs as go
import plotly.offline as py
from plotly.offline import iplot
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
import shap
import cufflinks as cf
import warnings
warnings.filterwarnings("ignore")
import random
import os
import re
import sys
import timeit
import string
import time
from datetime import datetime
from time import time
from dateutil.parser import parse
import joblib
pip install xgboost
Requirement already satisfied: xgboost in c:\users\ployh\anaconda3\lib\site-packages (1.6.1) Requirement already satisfied: scipy in c:\users\ployh\anaconda3\lib\site-packages (from xgboost) (1.7.3) Requirement already satisfied: numpy in c:\users\ployh\anaconda3\lib\site-packages (from xgboost) (1.21.5) Note: you may need to restart the kernel to use updated packages.
data = pd.read_csv('C:\\Users\\ployh\\OneDrive\\Desktop\\Data .csv')
data.head()
| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
data.columns
Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
dtype='object')
data.describe()
| | SeniorCitizen | tenure | MonthlyCharges |
|---|---|---|---|
| count | 7043.000000 | 7043.000000 | 7043.000000 |
| mean | 0.162147 | 32.371149 | 64.761692 |
| std | 0.368612 | 24.559481 | 30.090047 |
| min | 0.000000 | 0.000000 | 18.250000 |
| 25% | 0.000000 | 9.000000 | 35.500000 |
| 50% | 0.000000 | 29.000000 | 70.350000 |
| 75% | 0.000000 | 55.000000 | 89.850000 |
| max | 1.000000 | 72.000000 | 118.750000 |
data.dtypes
customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object
data.columns.to_series().groupby(data.dtypes).groups
{int64: ['SeniorCitizen', 'tenure'], float64: ['MonthlyCharges'], object: ['customerID', 'gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod', 'TotalCharges', 'Churn']}
data.isna().any()
customerID          False
gender              False
SeniorCitizen       False
Partner             False
Dependents          False
tenure              False
PhoneService        False
MultipleLines       False
InternetService     False
OnlineSecurity      False
OnlineBackup        False
DeviceProtection    False
TechSupport         False
StreamingTV         False
StreamingMovies     False
Contract            False
PaperlessBilling    False
PaymentMethod       False
MonthlyCharges      False
TotalCharges        False
Churn               False
dtype: bool
Unique values within each categorical variable
data["PaymentMethod"].nunique()
data["PaymentMethod"].unique()
data["Contract"].nunique()
data["Contract"].unique()
array(['Month-to-month', 'One year', 'Two year'], dtype=object)
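The two columns above were inspected one at a time. As a minimal sketch, the same check can be run over every non-numeric column in one pass. The small DataFrame `df` below is a hypothetical stand-in for `data`, and the `unique_values` name is an illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the full dataset (only a few rows and columns).
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Mailed check", "Electronic check"],
    "tenure": [1, 34, 2, 45],
})

# Treat every non-numeric column as categorical.
categorical_cols = df.select_dtypes(exclude="number").columns

# Map each categorical column to its distinct values, in order of first appearance.
unique_values = {col: list(df[col].unique()) for col in categorical_cols}

for col, values in unique_values.items():
    print(f"{col} ({len(values)} unique): {values}")
```

On the real data, replacing `df` with `data` should list all categorical columns, including `customerID`, which is unique per row and therefore not informative.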
Check the distribution of the target variable
data["Churn"].value_counts()
No 5174 Yes 1869 Name: Churn, dtype: int64
The dataset is imbalanced: 1,869 of the 7,043 customers (about 26.5%) are labeled as churned, against 5,174 who are not.
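To put a number on the imbalance, the class proportions can be computed directly. This is a small sketch that rebuilds the `Churn` column from the counts reported above rather than reading the CSV; on the real data the same call would be `data["Churn"].value_counts(normalize=True)`.

```python
import pandas as pd

# Rebuild the target column from the counts shown above (5174 "No", 1869 "Yes").
churn = pd.Series(["No"] * 5174 + ["Yes"] * 1869, name="Churn")

# normalize=True returns class proportions instead of raw counts.
proportions = churn.value_counts(normalize=True)
churn_rate = proportions["Yes"]

print(proportions.round(4))
print(f"Churn rate: {churn_rate:.1%}")
```

With roughly one churner for every three non-churners, plain accuracy can be misleading, which is one reason to also look at recall, precision, and F1 later on.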
# Convert TotalCharges to numeric; entries that cannot be parsed become NaN (the result is already float64)
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce')
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   customerID        7043 non-null   object
 1   gender            7043 non-null   object
 2   SeniorCitizen     7043 non-null   int64
 3   Partner           7043 non-null   object
 4   Dependents        7043 non-null   object
 5   tenure            7043 non-null   int64
 6   PhoneService      7043 non-null   object
 7   MultipleLines     7043 non-null   object
 8   InternetService   7043 non-null   object
 9   OnlineSecurity    7043 non-null   object
 10  OnlineBackup      7043 non-null   object
 11  DeviceProtection  7043 non-null   object
 12  TechSupport       7043 non-null   object
 13  StreamingTV       7043 non-null   object
 14  StreamingMovies   7043 non-null   object
 15  Contract          7043 non-null   object
 16  PaperlessBilling  7043 non-null   object
 17  PaymentMethod     7043 non-null   object
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7032 non-null   float64
 20  Churn             7043 non-null   object
dtypes: float64(2), int64(2), object(17)
memory usage: 1.1+ MB
TotalCharges now has 11 missing values (7,032 non-null out of 7,043): these are entries that could not be parsed as numbers and were coerced to NaN. We fill them in the next step.
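The missing values only appear after the numeric conversion, which is why the first `isna()` check found nothing. A minimal sketch of that behaviour, using a hypothetical three-value sample in place of the real column:

```python
import pandas as pd

# Hypothetical sample: two valid charges and one blank string.
raw = pd.Series(["29.85", " ", "1889.5"], name="TotalCharges")

# The blank is an ordinary string, so isna() does not flag it.
missing_before = int(raw.isna().sum())

# errors="coerce" turns anything that cannot be parsed into NaN.
converted = pd.to_numeric(raw, errors="coerce")
n_missing = int(converted.isna().sum())

print(missing_before, n_missing, converted.dtype)
```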
data.isna().any()
customerID          False
gender              False
SeniorCitizen       False
Partner             False
Dependents          False
tenure              False
PhoneService        False
MultipleLines       False
InternetService     False
OnlineSecurity      False
OnlineBackup        False
DeviceProtection    False
TechSupport         False
StreamingTV         False
StreamingMovies     False
Contract            False
PaperlessBilling    False
PaymentMethod       False
MonthlyCharges      False
TotalCharges         True
Churn               False
dtype: bool
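# Collect the names of the columns that contain at least one missing value
fillna = data.isna().any()
fillna = fillna[fillna == True].reset_index()
fillna = fillna["index"].tolist()
# Fill numeric columns with their mean, skipping customerID (the first column)
for col in data.columns[1:]:
    if col in fillna:
        if data[col].dtype != 'object':
            # Note: round(0) rounds every value in the column, not only the filled ones
            data[col] = data[col].fillna(data[col].mean()).round(0)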
data.isna().any()
customerID          False
gender              False
SeniorCitizen       False
Partner             False
Dependents          False
tenure              False
PhoneService        False
MultipleLines       False
InternetService     False
OnlineSecurity      False
OnlineBackup        False
DeviceProtection    False
TechSupport         False
StreamingTV         False
StreamingMovies     False
Contract            False
PaperlessBilling    False
PaymentMethod       False
MonthlyCharges      False
TotalCharges        False
Churn               False
dtype: bool
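The fill step above uses the column mean and rounds the whole column. As a hedged sketch of two alternatives (not what this notebook does), the code below fills only the missing entries and leaves observed values untouched. The sample frame `df` is hypothetical, and the second option rests on an assumption: that a customer with `tenure` 0 has not been billed yet, so a total of 0 is a reasonable fill.

```python
import numpy as np
import pandas as pd

# Hypothetical sample modelled on the columns above.
df = pd.DataFrame({
    "tenure": [1, 34, 0],
    "MonthlyCharges": [29.85, 56.95, 25.35],
    "TotalCharges": [29.85, 1889.5, np.nan],
})

# Option 1: mean imputation that keeps the observed values unrounded.
mean_filled = df["TotalCharges"].fillna(df["TotalCharges"].mean())

# Option 2 (assumption): tenure 0 means nothing has been billed yet, so use 0.
new_customer = df["TotalCharges"].isna() & (df["tenure"] == 0)
zero_filled = df["TotalCharges"].mask(new_customer, 0.0)

print(mean_filled.tolist())
print(zero_filled.tolist())
```

Which option is appropriate depends on how the billing data was recorded; the row inspected further down (index 3826, tenure 0) is the kind of case where the choice matters.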
Drop the unnecessary customerID column
data2 = data.drop('customerID', axis=1).copy()
data2.head()
| | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 30.0 | No |
| 1 | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1890.0 | No |
| 2 | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.0 | Yes |
| 3 | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1841.0 | No |
| 4 | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 152.0 | Yes |
data2['TotalCharges'][3826]
2283.0
data2.iloc[3826]
gender                             Male
SeniorCitizen                         0
Partner                             Yes
Dependents                          Yes
tenure                                0
PhoneService                        Yes
MultipleLines                       Yes
InternetService                      No
OnlineSecurity      No internet service
OnlineBackup        No internet service
DeviceProtection    No internet service
TechSupport         No internet service
StreamingTV         No internet service
StreamingMovies     No internet service
Contract                       Two year
PaperlessBilling                     No
PaymentMethod              Mailed check
MonthlyCharges                    25.35
TotalCharges                     2283.0
Churn                                No
Name: 3826, dtype: object
data2['TotalCharges']= data2['TotalCharges'].apply(lambda x: x if x!= ' ' else np.nan).astype(float)
data2.head()
| | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 30.0 | No |
| 1 | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1890.0 | No |
| 2 | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.0 | Yes |
| 3 | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1841.0 | No |
| 4 | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 152.0 | Yes |
Encoding part
lebel = LabelEncoder()
data2['Churn']=lebel.fit_transform(data2['Churn'])
numeric= data2.select_dtypes('number').columns
category = data2.select_dtypes('object').columns
matrix = np.triu(data2[numeric].corr())
fig, ax = plt.subplots(figsize=(14,10))
sns.heatmap(data2[numeric].corr(), annot=True, cmap='viridis', mask=matrix, ax=ax)
<AxesSubplot:>
Categorical Features
data2[category].nunique()
gender 2 Partner 2 Dependents 2 PhoneService 2 MultipleLines 3 InternetService 3 OnlineSecurity 3 OnlineBackup 3 DeviceProtection 3 TechSupport 3 StreamingTV 3 StreamingMovies 3 Contract 3 PaperlessBilling 2 PaymentMethod 4 dtype: int64
# Show the distinct values of each categorical column.
# (data2 has 7043 rows and 20 columns at this point.)
for feature in category:
    print(feature, data2[feature].unique())
data2['MultipleLines']= data2['MultipleLines'].replace('No phone service','No')
data2[['OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV','StreamingMovies']]= data2[['OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV','StreamingMovies']].replace('No internet service','No')
fig = px.histogram(data2, x="gender", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
There is little variation between male and female churn rates.
fig = px.histogram(data2, x="Partner", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
Customers without a partner are nearly 1.7 times as likely to leave as those with a partner.
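Ratios like the one above come from comparing the churn rate of two groups. The sketch below shows that arithmetic on made-up rows (the `rows` list and the helper `churn_rate_by_group` are illustrative, not part of the notebook). On the real frame, `data2.groupby("Partner")["Churn"].mean()` should give the same per-group rates once `Churn` is encoded as 0/1.

```python
def churn_rate_by_group(rows, feature):
    """Return {group value: share of rows with Churn == 1} for one feature."""
    totals, churned = {}, {}
    for row in rows:
        key = row[feature]
        totals[key] = totals.get(key, 0) + 1
        churned[key] = churned.get(key, 0) + row["Churn"]
    return {key: churned[key] / totals[key] for key in totals}


# Hypothetical toy data: 10 customers without a partner, 10 with one.
rows = (
    [{"Partner": "No", "Churn": 1}] * 3
    + [{"Partner": "No", "Churn": 0}] * 7
    + [{"Partner": "Yes", "Churn": 1}] * 2
    + [{"Partner": "Yes", "Churn": 0}] * 8
)

rates = churn_rate_by_group(rows, "Partner")
# How many times more likely "No" customers are to churn than "Yes" customers.
ratio = rates["No"] / rates["Yes"]
print(rates, round(ratio, 2))
```

The same helper applies to any of the categorical columns plotted in this section.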
fig = px.histogram(data2, x="Dependents", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
Customers without dependents are roughly 2.03 times more likely to leave than those with dependents.
fig = px.histogram(data2, x="PhoneService", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
The difference in churn rate between customers who have home phone service with the provider and those who do not is negligible.
fig = px.histogram(data2, x="MultipleLines", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
The difference in churn rate between customers who have multiple lines of phone service with the provider and those who do not is minimal.
fig = px.histogram(data2, x="InternetService", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
Those with fibre optic internet connection are 5.66 times more likely to churn than customers without internet service.
fig = px.histogram(data2, x="OnlineSecurity", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
A client with an online security service provided by the firm is nearly 2.14 times less likely to abandon the company than a consumer without such a service.
fig = px.histogram(data2, x="DeviceProtection", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
A client with a device protection service from the firm is about 1.27 times less likely to depart than a customer without such a service.
fig = px.histogram(data2, x="TechSupport", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
A customer with the company's tech support service is almost 2.06 times less likely to leave than a customer without it.
fig = px.histogram(data2, x="StreamingTV", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
A client with a Streaming TV service from the firm is 1.24 times more likely to depart than a customer without such a service.
fig = px.histogram(data2, x="StreamingMovies", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
A client with a Streaming Movies service from the firm is about 1.23 times more likely to depart than a customer without a Streaming movies service from the company.
fig = px.histogram(data2, x="Contract", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
The histogram and the mean differences show large discrepancies by contract type. Customers with a two-year contract are about 15.1 times less likely to churn than those on a month-to-month plan, and customers with a one-year contract are about 3.79 times less likely to churn than month-to-month customers.
fig = px.histogram(data2, x="PaperlessBilling", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
A customer who uses paperless billing is nearly 2.06 times more likely to leave than a customer who does not.
fig = px.histogram(data2, x="PaymentMethod", color="Churn",width=500, height=600, color_discrete_map={
0: '#553a99',
1: '#d52685'
})
fig.show()
Almost half of the customers who pay by electronic check churn.
X= data2.drop('Churn', axis=1)
y= data2['Churn']
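# Treat every non-float column as categorical (np.float is deprecated; the builtin float is equivalent here)
categorical_features_indices = np.where(X.dtypes != float)[0]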
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
The model I chose is the CatBoost classifier. The name combines "Cat" (Category) and "Boost" (Boosting). CatBoost is built on gradient-boosted decision trees: during training, trees are built one after another, and each new tree is fit to reduce the loss left by the trees before it. The number of trees is set by the model's parameters.
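To make the "each new tree reduces the remaining loss" idea concrete, here is a minimal sketch of plain gradient boosting for squared error with one-split trees (stumps) on a single feature. It is not CatBoost's actual algorithm, which adds its own tree structure and categorical handling; the helper names `fit_stump` and `boost`, the toy data, and the learning rate are all assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)           # start from the mean prediction
    predictions = [base] * len(ys)
    trees, losses = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
        # Mean squared error after adding this tree.
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, trees, losses


# Hypothetical toy data with three rough levels.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
base, trees, losses = boost(xs, ys)
print([round(loss, 4) for loss in losses[:5]])
```

The `losses` list never increases from one tree to the next, which is the behaviour the paragraph above describes; `n_trees` plays the role of the tree-count setting.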
In this step, we focus mainly on the recall score. Recall is the ratio TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. Intuitively, recall is the classifier's ability to find all the positive samples, which here means the customers who actually churn.
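As a worked example of the recall definition, the sketch below counts the confusion-matrix cells by hand on a small made-up label vector and derives recall and precision from them. The helper names and the toy labels are illustrative; in the notebook itself these values come from scikit-learn's `recall_score` and `precision_score`.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy labels: 4 churners, 6 non-churners.
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_toy = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

print(confusion_counts(y_true_toy, y_pred_toy))
print(recall(y_true_toy, y_pred_toy), precision(y_true_toy, y_pred_toy))
```

Here two of the four churners are found, so recall is 0.5, while two of the three predicted churners are correct, so precision is about 0.67.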
accuracy= []
recall =[]
roc_auc= []
precision = []
model_names =[]
catboost_base = CatBoostClassifier(verbose=False,random_state=0)
catboost_base.fit(X_train, y_train,cat_features=categorical_features_indices,eval_set=(X_test, y_test))
y_pred = catboost_base.predict(X_test)
accuracy.append(round(accuracy_score(y_test, y_pred),4))
recall.append(round(recall_score(y_test, y_pred),4))
roc_auc.append(round(roc_auc_score(y_test, y_pred),4))
precision.append(round(precision_score(y_test, y_pred),4))
model_names = ['Catboost']
result_cat = pd.DataFrame({'Accuracy':accuracy,'Recall':recall, 'Roc_Auc':roc_auc, 'Precision':precision}, index=model_names)
result_cat
| | Accuracy | Recall | Roc_Auc | Precision |
|---|---|---|---|---|
| Catboost | 0.8012 | 0.5105 | 0.7101 | 0.6782 |
fig, ax = plt.subplots(figsize=(5, 5))
plot_confusion_matrix(catboost_base, X_test, y_test, cmap=plt.cm.Greens, ax=ax);
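One point worth checking about the Roc_Auc column: the cells above pass hard 0/1 predictions to `roc_auc_score`. With hard labels, ROC AUC collapses to the average of recall and specificity, whereas ROC AUC is more commonly computed from predicted probabilities (for CatBoost, something like `catboost_base.predict_proba(X_test)[:, 1]`). The sketch below uses a small pairwise AUC helper and made-up scores to show the difference; `roc_auc`, `proba_toy` and the 0.5 cut-off are assumptions for illustration only.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted churn probabilities.
labels_toy = [1, 1, 0, 1, 0, 0]
proba_toy = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
hard_toy = [1 if p >= 0.5 else 0 for p in proba_toy]

auc_from_proba = roc_auc(labels_toy, proba_toy)
auc_from_labels = roc_auc(labels_toy, hard_toy)

# With hard labels, AUC equals (recall + specificity) / 2.
recall_toy = sum(1 for t, h in zip(labels_toy, hard_toy) if t == 1 and h == 1) / 3
specificity_toy = sum(1 for t, h in zip(labels_toy, hard_toy) if t == 0 and h == 0) / 3

print(round(auc_from_proba, 4), round(auc_from_labels, 4))
```

On this toy data the probability-based AUC is higher than the label-based one, because thresholding throws away the ranking information within each predicted class.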
# Metric containers for this run
accuracy = []
recall = []
roc_auc = []
precision = []
model_names = []
# scale_pos_weight=3 multiplies the weight of the positive (churn) class by 3
catboost_scale3 = CatBoostClassifier(verbose=False, random_state=0, scale_pos_weight=3)
catboost_scale3.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_test, y_test))
y_pred = catboost_scale3.predict(X_test)
# Score the hard class predictions on the test set
accuracy.append(round(accuracy_score(y_test, y_pred), 4))
recall.append(round(recall_score(y_test, y_pred), 4))
roc_auc.append(round(roc_auc_score(y_test, y_pred), 4))
precision.append(round(precision_score(y_test, y_pred), 4))
model_names = ['Catboost_scale3']
result_cat3 = pd.DataFrame({'Accuracy': accuracy, 'Recall': recall, 'Roc_Auc': roc_auc, 'Precision': precision}, index=model_names)
result_cat3
| | Accuracy | Recall | Roc_Auc | Precision |
|---|---|---|---|---|
| Catboost_scale3 | 0.7582 | 0.8345 | 0.7821 | 0.5352 |
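# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay.from_estimator (available since 1.0) replaces it
from sklearn.metrics import ConfusionMatrixDisplay
fig, ax = plt.subplots(figsize=(5, 5))
ConfusionMatrixDisplay.from_estimator(catboost_scale3, X_test, y_test, cmap=plt.cm.Greens, ax=ax);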
# Metric containers for this run
accuracy = []
recall = []
roc_auc = []
precision = []
model_names = []
# scale_pos_weight=5 multiplies the weight of the positive (churn) class by 5
catboost_scale5 = CatBoostClassifier(verbose=False, random_state=0, scale_pos_weight=5)
catboost_scale5.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_test, y_test))
y_pred = catboost_scale5.predict(X_test)
# Score the hard class predictions on the test set
accuracy.append(round(accuracy_score(y_test, y_pred), 4))
recall.append(round(recall_score(y_test, y_pred), 4))
roc_auc.append(round(roc_auc_score(y_test, y_pred), 4))
precision.append(round(precision_score(y_test, y_pred), 4))
model_names = ['Catboost_scale5']
result_cat5 = pd.DataFrame({'Accuracy': accuracy, 'Recall': recall, 'Roc_Auc': roc_auc, 'Precision': precision}, index=model_names)
result_cat5
| | Accuracy | Recall | Roc_Auc | Precision |
|---|---|---|---|---|
| Catboost_scale5 | 0.6947 | 0.9111 | 0.7626 | 0.4682 |
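# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay.from_estimator (available since 1.0) replaces it
from sklearn.metrics import ConfusionMatrixDisplay
fig, ax = plt.subplots(figsize=(5, 5))
ConfusionMatrixDisplay.from_estimator(catboost_scale5, X_test, y_test, cmap=plt.cm.Greens, ax=ax);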
# Stack the three single-row result frames into one comparison table
result_catboost = pd.concat([result_cat, result_cat3, result_cat5], axis=0)
result_catboost
| | Accuracy | Recall | Roc_Auc | Precision |
|---|---|---|---|---|
| Catboost | 0.8012 | 0.5105 | 0.7101 | 0.6782 |
| Catboost_scale3 | 0.7582 | 0.8345 | 0.7821 | 0.5352 |
| Catboost_scale5 | 0.6947 | 0.9111 | 0.7626 | 0.4682 |
# Compare the three models by recall in a horizontal bar chart
result_catboost.sort_values(by=['Recall'], ascending=True, inplace=True)
fig = px.bar(result_catboost, x='Recall', y=result_catboost.index,
             color_discrete_sequence=px.colors.qualitative.Bold,
             title='Catboost Model Comparison', height=600, labels={'index': 'MODELS'})
fig.show()
Based on recall, the CatBoost model with `scale_pos_weight=5` scores highest (0.9111), at the cost of lower precision (0.4682). I therefore use this model for the SHAP analysis in the next step.
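The values 3 and 5 for `scale_pos_weight` were picked by hand. A common starting point, suggested in the CatBoost documentation for imbalanced data, is the ratio of negative to positive samples. The sketch below computes that ratio; the helper name and the 73/27 class split are hypothetical and are not measured from this dataset.

```python
def suggested_scale_pos_weight(labels):
    """Return n_negative / n_positive as a starting value for scale_pos_weight."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive samples, the ratio is undefined")
    return n_neg / n_pos


# Hypothetical training labels: 73 non-churners and 27 churners.
y_example = [0] * 73 + [1] * 27
weight = suggested_scale_pos_weight(y_example)
print(round(weight, 2))
```

In the notebook this would be called on `y_train`. Values above the ratio, as with 5 here, push the model further toward recall and away from precision.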
Next, we use SHAP to explain the predictions of the CatBoost classifier.
explainercat = shap.TreeExplainer(catboost_scale5)
shap_values_cat_test = explainercat.shap_values(X_test)
shap_values_cat_train = explainercat.shap_values(X_train)
shap.summary_plot(shap_values_cat_train, X_train, plot_type="bar")
As the bar chart shows, the five features that most influence the churn prediction are Contract, Internet service, tenure, Payment method and Paperless Billing.
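The bar variant of the SHAP summary plot ranks features by the mean absolute SHAP value over all rows. The sketch below reproduces that ranking on a tiny hypothetical matrix; in the notebook the equivalent would presumably be `np.abs(shap_values_cat_train).mean(axis=0)`, assuming the explainer returns a single 2-D array.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP value| across samples, largest first."""
    n_rows = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: one row per customer, one column per feature.
feature_names = ["Contract", "tenure", "PaperlessBilling"]
shap_rows = [
    [0.8, -0.3, 0.1],
    [-0.6, 0.4, 0.0],
    [0.7, -0.2, -0.1],
]
ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
for name, value in ranking:
    print(name, round(value, 3))
```

Because absolute values are averaged, the bar chart shows how strongly a feature moves predictions but not in which direction; the default dot-style summary plot is needed to see the direction.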
As the SHAP analysis shows, contract type is the strongest driver of churn in the model, and month-to-month contracts have the highest churn proportion of all contract types. I suggest offering these customers a unique deal, such as unlimited calling for the first two months, to retain their business.
We should also offer a deal that attracts new customers, such as a free Netflix subscription for the first three months with the purchase of an internet plan. In my opinion, customer service is crucial: good customer service makes it easier to attract new customers, and those customers stay with the organisation longer.